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Abstract 

We applied genome-wide allele-specific expression analysis of monocytes from 188 samples. Monocytes were purified from 
white blood cells of healthy blood donors to detect c/s-acting genetic variation that regulates the expression of long non- 
coding RNAs. We analysed 8929 regions harboring genes for potential long non-coding RNA that were retrieved from data 
from the ENCODE project. Of these regions, 60% were annotated as intergenic, which implies that they do not overlap with 
protein-coding genes. Focusing on the intergenic regions, and using stringent analysis of the allele-specific expression data, 
we detected robust c/s-regulatory SNPs in 258 out of 489 informative intergenic regions included in the analysis. The cis- 
regulatory SNPs that were significantly associated with allele-specific expression of long non-coding RNAs were enriched to 
enhancer regions marked for active or bivalent, poised chromatin by histone modifications. Out of the IncRNA regions 
regulated by c/s-acting regulatory SNPs, 20% (n = 52) were co-regulated with the closest protein coding gene. We compared 
the identified c/s-regulatory SNPs with those in the catalog of SNPs identified by genome-wide association studies of human 
diseases and traits. This comparison identified 32 SNPs in loci from genome-wide association studies that displayed a strong 
association signal with allele-specific expression of non-coding RNAs in monocytes, with p-values ranging from 6.7x10~ 7 to 
9.5x10~ 89 . The identified c/s-regulatory SNPs are associated with diseases of the immune system, like multiple sclerosis and 
rheumatoid arthritis. 
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Introduction 

Determination of allele-specific gene expression (ASE) levels by 
quantitative genotyping of heterozygous SNPs on the RNA level 
[1], with genome-wide panels of SNPs [2,3] can be used as a guide 
for identifying cw-acting genetic variants that regulate gene 
expression. ASE analysis is more powerful for detecting cis- 
regulated gene expression than the total expression levels of genes 
determined by regular eQTL analysis [4]. The power and 
precision of ASE analysis to detect cit-regulatory SNPs (aV-rSNPs) 
stems from the fact that the differential expression of the two 
alleles of a SNP are measured in the same environment and have 



been exposed to the same environmental conditions in the cells 
from which the RNA was extracted [4], Thus trans acting and 
environmental factors that may affect gene expression are 
controlled for in ASE analysis. To adjust for possible methodo- 
logical differences in the efficiency of the genotyping assay to 
measure the two alleles of a SNP, such as the sequence context of a 
SNP or copy number variations, the relative amounts of the alleles 
measured in DNA, is used as a reference for quantification of the 
allelic expression. Genome-wide ASE analysis by SNP genotyping 
has been applied to map ay-regulation of protein coding genes 
associated with human diseases and traits in lymphoblastoid cell 
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lines [2,5], osteoblasts [6], fibroblasts [5,7], T-cells [8], and 
monocytes [5]. 

Long non-coding RNAs (IncRNAs) are involved in gene 
regulation and other cellular processes. LncRNAs such as XIST 
and TSIX, which are involved in X chromosome inactivation 
[9, 1 0] , are well known. Other well known examples of IncRNAs 
are AIR that suppresses gene expression via hypermethylation and 
HOTAIR that interacts with the Polycomb Repressive Complex 2 
to silence the HOXD locus [1 1]. New functions for IncRNAs are 
continuously being discovered. LncRNAs can affect gene expres- 
sion in many ways, as scaffolds or guides for chromatin 
modifications, as decoys for reducing the amount of transcription 
factors interacting with chromatin, as signaling molecules reflect- 
ing active transcription factor complexes, or as reservoirs for 
microRNAs [12]. The mechanism for regulation of gene 
expression by IncRNAs usually involves formation of RNA-protein 
complexes that influence the gene expression. Recent large studies 
[13,14,15,16,17,18] have found that IncRNAs display positive 
correlations with expression of protein-coding genes in cis, and 
especially with genes that overlap with the antisense strand of a 
IncRNA. By investigating the effect of regulatory SNPs on the 
expression of IncRNAs using traditional microarray-based ex- 
pressed quantiative trait locus (eQTL) analysis of peripheral blood 
cells a recent study identified 112 civ-regulated IncRNAs [19]. 

The aim of our study was to apply ASE-analysis to identify ex- 
acting SNPs that regulate the expression of IncRNAs and to 
highlight previously unknown roles for IncRNAs in human 
complex diseases. We reasoned that ASE-analysis that is highly 
sensitive for detecting ctf-rSNPs would be a powerful tool for 
studying cis-regulation of IncRNAs that are expressed at lower 
levels than protein-coding genes. To our knowledge ASE-analysis 
has not been previously used for studies of IncRNAs. 

We analysed RNA extracted from human monocytes purified 
from white blood cells of 188 healthy blood donors using a 
genome-wide panel of SNPs. Monocytes were selected for analysis 
because they are a relevant cell type for multiple diseases. Our 
analysis included 8929 genomic regions that harbor genes for 
potential long non-coding RNA that were retrieved from the 
GENCODE database [18]. Of these gene regions, 60% were 
annotated as intergenic, and thus they do not overlap with protein- 
coding genes. Using stringent criteria for the identification of SNPs 
that regulate the expression of IncRNAs we identified 8267 as- 
acting regulatory SNPs out of which 3910 are located in intergenic 
regions and 571 of these are located in enhancer regions marked 
as active or poised by histone modifications in monocytes. We also 
compared the indentified SNPs that regulate expression of 
IncRNAs with risk SNPs for complex diseases and traits previously 
identified in genome-wide association studies (GWAS). In this way 
we were able to obtain new functional clues for these disease 
associated GWAS loci. Additionally, we analysed co-expression 
between IncRNA genes and nearby protein coding genes, and 
found that 20% of the intergenic IncRNAs with a ay-rSNP were 
co-regulated with a protein-coding gene. 

Materials and Methods 

Samples 

Circulating monocytes were collected from healthy adult blood 
donors of European origin (n = 188) recruited from the United 
Kingdom National Blood Service Centre in Cambridge, UK as 
part of the Cardiogenics Transcriptomic Study [20]. Volunteer 
donors with a self-reported recent or acute illness were excluded. 
Donors with measured full blood cell count and C-reactive plasma 



protein levels outside the normal ranges were also excluded from 
the study. 

DNA was extracted from peripheral blood leukocytes using the 
guanidine hydrochloride - chloroform method. CD 14+ magnetic 
microbeads (autoMACS Pro, Miltenyi Biotec, Bergisch Gladbach, 
Germany) were used to isolate monocytes from whole blood. RNA 
was extracted from cell pellets of freshly isolated monocytes by 
homogenization with Trizol-reagent (Invitrogen, Paisley, UK), 
chloroform-ethanol extraction and purification using Qiagen 
RNAeasy columns and reagents, followed by on-column DNase 
treatment. cDNA was synthesized using reagents from the 
Ulumina TotalPrep RNA Amplification Kit, except that the 
poly-dT primers were substituted by random decamers (Applied 
Biosystems, Carlsbad, California, US). 

Ethics statement 

All participants in the study provided their written consent after 
a personal meeting with a study nurse. Written consent from each 
participant was recorded by name, date and signature on a 
standardized consent form. Both the consent form and the study 
has been reviewed and approved by the Cambridgeshire 1 
Research Ethics Committee. The Research Ethics Committee is 
independent from the research institute and part of the UK Health 
Research Authority. 

ENCODE IncRNA Regions 

A total of 8929 genomic regions containing IncRNAs were 
retrieved from the GENCODE version 7 database [18]. From 
regions containing several transcripts, only the longest transcript 
was included to avoid analyzing completely overlapping regions. 
Of the IncRNA regions, 5346 were annotated as intergenic, which 
implies that that there is no overlap between the IncRNA and any 
known protein coding gene. The remaining 3583 regions overlap 
with either exons, introns or encompass an entire gene on the 
sense or anti- sense strand. 

Allele-specific expression analysis 

RNA (cDNA) and genomic DNA (gDNA) from the monocyte 
samples were genotyped using the Infinium assay and Human 
1.2 M Duo custom BeadChips (Illumina, San Diego, California, 
USA) as described previously [4] . The genotype data from cDNA 
is used as a quantitative measure for gene expression in the 
calculation of the ASE-levels. The genotypes called in gDNA are 
used for three different purposes i.e. (i) to phase the SNP alleles on 
each chromosome to facilitate ASE-calling using multiple SNPs 
per transcript, (ii) to correct the ASE-levels in cDNA for possible 
allelic bias in gDNA due to sequence context of the SNPs, and (iii) 
to test for associations between potential oV-rSNPs and the ASE- 
levels of IncRNAS in a genomic region. Genotypes were called in 
gDNA using Genome Studio version 2009.2 (Illumina) with a call 
rate of 99% as the threshold for genotype calls for SNPs and 98% 
for samples. SNPs were further filtered on deviations from Hardy- 
Weinberg equilibrium with a p-value cutoff of 10 6 (Chi-squared 
test). One sample with higher than 40% call rate in cDNA were 
removed due to possible DNA contamination. The raw two-colour 
fluorescence signals from the Infinium assay were normalized to 
remove dependency of the allele fractions on the signal intensities 
of the fluorophores using a quadratic function with medians of 
bins for the two fluorescence colors. From the regression, the 
intensities were predicted based on the log 10 values of the raw 
signal, which were used to adjust the allele fraction, see Methods 
SI for further details and equations. ASE levels were calculated for 
each heterozygote SNP as the difference in normalized allele 
fractions between cDNA and gDNA: [Allele l cDNA / (Allele 1 c dna+ 
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Allele2 <: DNA)] - [Allele l g DNA/ (Allele l gDN A+Allele2gDNA)]- Here, 
gDNA, where the two SNP alleles are present in 1 : 1 ratio serves as 
a quantification standard for the relative expression levels of the 
two alleles in RNA. 

A gene region without ASE will obtain an ASE-level close to 0, 
while a gene region with ASE, where the gDNA fraction is 0.5, will 
have an absolute ASE-level 0<X< = 0.5. The software IMPUTE 
2 [21] (v2.1.0) was used to impute missing genotypes and to phase 
the genotypes across IncRNA gene regions in each individual 
sample. As a reference panel prefiltered haplotypes from 
HapMap3 release #2 [22] and 1000 Genomes pilotl [23] 
available at the IMUPUTE 2 website were used. The ASE levels 
were assigned in each individual sample as the average ASE level 
for all heterozygous SNPs within a genomic region (ASE window), 
corresponding to the region of an annotated IncRNA. Windows of 
IncRNAs with less than three informative heterozygous SNPs were 
excluded. The association between SNPs and the ASE levels of 
IncRNAs was assessed by linear regression of the ASE levels in the 
groups of samples with heterozygous SNPs and homozygous SNPs 
[4] . In summary, homozygous SNPs (AA and BB) are in one group 
with an expected ASE-level close to zero, and heterozygous SNPs 
are either AB or BA with a stochastically determined direction 
during phasing. In the case of significant ASE the AB and BA 
group will on average obtain different signs of the ASE-level. SNPs 
with significant associations with allele specific expression are 
denoted as cif-rSNPs (Figure SI). 

All analyses described above were performed and data is 
presented using the NCBI36/hgl8 assembly as reference genome. 

Refseq protein coding regions 

Protein coding regions extracted from the Refseq database were 
analysed to detect SNPs with association signals that overlap with 
those from IncRNAs. The total number of transcripts in Refseq 
(downloaded 12 th of January 2013) was 42797. To retain only 
protein coding genes, all genes annotated as RNA genes in the 
GeneCards database [24] were removed, leaving 38387 tran- 
scripts. To avoid completely overlapping transcripts, 19654 
transcripts that are a subsequence of another transcript were 
removed. As for IncRNAs, only regions that contained at least 
three informative SNPs were included in the analysis, leaving 
10345 protein-coding regions for further analysis. 

Determination of total gene expression levels using the 
genotype data 

The sum of the raw fluorescence signal intensities from both 
alleles of a SNP in cDNA was used as a measure of total gene 
expression in the position of the SNP. The average sum of the SNP 
signals across each IncRNA window and across all samples 
represents the total expression value for each IncRNA window. A 
stringent average signal threshold of 1000 fluorescence units for all 
188 samples was set for the summed signal intensities, see Figure 
S2. 

Enhancer regions 

Monocyte chromatin immunoprecipitation sequencing (ChlP- 
seq) data was retrieved from the Blueprint project [25]. We 
analysed the four samples available in the Blueprint data using two 
chromatin signatures for each sample; H3K27ac, corresponding to 
an active enhancer [26,27] and H3K4mel, corresponding to a 
poised enhancer [28,29]. The peaks supplied in the Blueprint data 
had been called by MACS2 using the standard parameters for 
each signature. For a ChlP-seq peak to be included in our analysis, 
we required a p-value of 10 -5 in at least two out of four 



overlapping peaks. To be counted as an overlapping peak the 
nucleotide overlap was required to be over 50%. This resulted in 
22091 peaks for H3K27ac and 32692 peaks for H3K4mel. 

Co-expression of IncRNA and closest protein coding 
gene 

We investigated to what extent expression of IncRNA genes 
with a cw-rSNP were correlated with gene expression of the closest 
protein coding gene. For this purpose we divided the samples into 
three groups based on three genotype combinations of the most 
significantly associated or-rSNP (AA, AB/BA, BB). For each 
expressed gene region we calculated the average signal intensity 
for both alleles and used this value as a measure of gene expression 
levels. Next, we performed linear regression analysis with the three 
genotype groups as x-values and the average total expression levels 
as y-values in order to obtain a p-value for the co-expression. 

Results 

Allele-specific expression of IncRNA regions 

We explored os-regulation of 8929 genomic regions harboring 
IncRNA from the ENCODE project using allele-specific expres- 
sion (ASE) analysis of 188 RNA samples from human primary 
monocytes. To determine ASE, we genotyped RNA (cDNA) and 
genomic DNA from the monocytes using a genome-wide panel of 
1.2 million SNP markers [4]. To be informative for the detection 
of ASE, a SNP has to be heterozygous in DNA and expressed at a 
detectable level in RNA. Out of the 8929 IncRNA regions, 60% 
were annotated as intergenic. 1298 regions contain at least three 
informative SNPs and out of these 1122 regions (489 intergenic 
regions) showed a signal intensity above background and were 
considered to be expressed in the monocytes and were thus 
included in the ASE analysis. Of the individual SNPs in the 
IncRNA regions, two thirds have an intensity level above 1000 in 
at least 90% of the samples. We found that the mean expression 
levels of the IncRNAs were 1.5-fold lower than the expression 
levels of exons (Figure 1), while the IncRNAs were expressed at a 
1.5-fold and 5.5-fold higher level than intronic and inter protein 
coding gene regions, respectively. 

The definitions of exon and intron boundaries were taken from 
Refseq. As protein coding genes are on average expressed at 
higher levels than IncRNAs, the ASE signal for IncRNA that 
overlap with protein coding genes is hard to distinguish from that 
of a protein-coding gene. Because of this, we focus on the 
intergenic IncRNA regions where the ASE data is most reliable. 

For the ASE analysis, the genotypes of 194530 SNPs located 
within 250 kb upstream and downstream of the IncRNA regions 
were tested for their association with the ASE levels of the 
corresponding expressed IncRNAs, using linear regression, see 
Manhattan plot in Figure 2. A p-value cut-off of p< 10 6 , based 
on the number of tests performed and adjusted for SNPs in high 
LD (>0.9), was used to correct the association signals for multiple 
testing. In addition a threshold of 0.05 for the slope of the 
regression line was applied to exclude SNPs with low effects on 
ASE. This value corresponds to approximately a 20% expression 
difference between the two alleles and was set as a reasonable 
threshold for biological relevance. To avoid false positives with 
inflated p-values, we only retained associations that were based on 
at least four data points for each allele combination for the 
calculation of the p-value [4] (see example regression plots in 
Figure S3). Using these criteria, we detected 258 intergenic (53%) 
IncRNAs with at least one associated oy-rSNP (Table SI). The 
associated intergenic IncRNA regions varied in size from 0.9 kb to 
547 kb. A total of 8267 of the 194530 tested SNPs (4.2%) were 
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Figure 1 . Comparison of total expression levels of regions annotated to intergenic IncRNAs, exons, introns, and intergenic regions. 

The horizontal axes in the panels show bins of fluorescence signals from the genotyping data, summed for both alleles to give a measure for total 
expression. The average expression levels of annotated transcripts were 4900 fluorescence units in exons, 21 00 in introns, 590 in intergenic regions, 
compared to 3300 in the intergenic IncRNA regions that were used in the ASE analysis. The vertical axes show the number of observations in each 
bin. 

doi:1 0.1 371 /journal.pone.01 0261 2.g001 



associated with ASE of a IncRNA region (p-value<10 ) in the 
188 analysed monocyte samples, and for the intergenic IncRNA 
regions 3910 out of 79628 tested SNPs (4.9%) were significantly 
associated oy-rSNPs. 

To allow comparison between ASE of IncRNA genes and 
protein coding Refseq genes, we applied the same procedure for 
ASE analysis to the Refseq gene regions as for the IncRNAs. This 
analysis identified 8604 cw-rSNP out of 108881 tested SNPs (7.9%) 
in 47 1 out of 10345 regions (4.6%) that were associated with ASE 
of a Refseq gene. 



Associated SNPs in enhancer regions 

We identified 57 1 a's-rSNPs located in enhancer regions marked 
as active or poised by histone modifications in monocytes. To 
determine if the cis-rSNPs for IncRNAs are enriched in enhancer 
regions, we calculated the fold enrichment of significantly 
associated ar-rSNPs defined by ASE analysis of intergenic IncRNA 
regions to histone marks for active enhancers (H3K27ac) and 
bivalent, poised enhancers (H3K4mel) in monocytes. The fold 
enrichment was determined by comparing the fraction of 
significantly associated SNPs in enhancer regions with the fraction 
of non-significant (p-value>0.5) SNPs in enhancer regions. Of the 
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Figure 2. Manhattan plot. Manhattan plot with the p-values from ASE association tests between SNPs and IncRNAs on the vertical axis and the 
genomic IncRNA regions analysed in the study on the horizontal axis. The p-value cut-off of 10~ 6 is shown as a grey line. 
doi:1 0.1 371 /journal.pone.01 0261 2.g002 



significantly associated SNPs, 2.1% were located in H3K27ac 
peaks compared to 1.2% for non-significantly associated SNPs, 
which corresponds to a 1.72-fold enrichment (p-value 9.0 xlO -1 , 
Fisher's exact test). For H3K4mel we observed a 1.54-fold 
enrichment (p-value 3.1 x 10 16 , Fisher's exact test) of significantly 
associated SNPs, with 12.5% of the significant SNPs located in the 
enhancer peaks, while this number was 8.1% for non-significant 
SNPs. For protein coding genes the percentages and fold 
enrichment are similar to those for IncRNA for H3K27ac with 
2.2% and 1.5% for significant and non-significant SNPs, 
respectively, and 1.43 fold enrichment (p-value 7.0x10 , Fisher's 
exact test) and less pronounced for H3K4mel with 9.8% and 
8.5% for significant and non-significant SNPs, respectively, and 
1.15 fold enrichment (p-value 3.5x10 4 , Fisher's exact test). This 
result shows that enhancers have a regulatory ru-effect on 
expression of not only protein coding genes, but also IncRNAs. 

Co-expression of IncRNA and closest protein coding 
gene 

Next we investigated whether the IncRNA genes and nearby 
protein coding genes are co-expressed and co-regulated by a cis- 
rSNP. In our data 52 (20%) of the identified intergenic IncRNA 
regions to which a cu-acting regulatory SNP was associated were 
also correlated with the expression level of a protein-coding gene 



at a false discovery rate (FDR) of 10%, suggesting co-regulation of 
the genes, see Table 1. We found that 21 of the oy-rSNPs are 
significantly associated both with ASE of a IncRNA and ASE of 
the closest protein coding gene, which implies that the expression 
of the IncRNA and the protein coding genes are regulated by the 
same os-rSNP. For 19 Of-rSNPs the ASE signal is unique to a 
IncRNA, which could be an indication of involvement of the 
IncRNA in the regulation of the expression of an adjacent protein 
coding gene as has been suggested in previous studies [14,16]. The 
remaining 12 cir-rSNPs had a distance >250 kb to the nearest 
protein coding gene and the ASE signal could therefore not be 
evaluated. 

GWAS lead SNPs overlap with c/'s-rSNP for IncRNAs 

Next we superimposed the cu-rSNP associated with ASE of 
intergenic IncRNAs in monocytes with SNPs identified in GWAS 
of human diseases and traits, listed in the National Human 
Genome Research Institute (NHGRI) GWAS Catalog [30]. The 
GWAS catalog (downloaded 12 th of January 2013) includes 9617 
entries, 641 traits and 7797 unique SNPs. We identified SNPs 
associated with 32 loci high-lighted by GWAS that are also cis- 
rSNP for IncRNAs. The m-rSNP are the same as the GWAS 
associated SNP for 25 of the loci, while seven ew-SNPs are linkage 
disequilibrium (LD) proxies to the lead GWAS SNPs, with five 
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ACTR3 


2.34E-03 


1.50E-25 


5.58E-14 


rs9605146 


22:15462934-15514699 


TPTEP1 


3.68E-16 


5.12E-08 


5.12E-08 


Regulation of IncRNA by c/s-rSNP, but not of protein-coding gene 


rs246105 


5:108600720-108689969 


PJA2 


1 .20E-03 


5.12E-24 


2.83E-06 


rs3862666 


11:60579785-60591583 


CD5 


7.26E-03 


4.08E-08 


6.24E-06 


rs948421 


8:61459701-61591893 


RAB2A 


8.47E-03 


6.08E-30 


9.84E-06 


rs3737813 


1:178162399-178177554 


TOR1AIP2 


2.40E-03 


3.38E-10 


4.38E-05 


rs850942 


12:13044652-13084880 


KIAA1467 


1.01E-02 


9.33E-20 


9.90E-05 


rs269782 


5:139517088-139528554 


CYSTM1 


3.41 E-03 


2.11E-32 


1 .20E-04 


rsl 21 92704 


6:30874410-30906415 


DDR1 


6.61 E-04 


1.85E-37 


1.55E-04 


rs4978941 


9:112401576-112407320 


SVEP1 


3.13E-08 


1.61 E-1 4 


3.30E-03 


rs8041 057 


15:38118804-38146783 


SRP14 


9.96E-04 


1 .60E-28 


4.44E-03 


rs6744457 


2:216119166-216286863 


FN1 


3.71 E-06 


1.85E-10 


9.65E-03 


rsl 01 67593 


2:71083169-71145381 


NAGK 


1.53E-03 


5.13E-43 


3.61 E-02 


rs3809472 


15:43458687-43481812 


SPATA5L1 


1 .24E-07 


2.28E-55 


4.00E-02 


rsl 0982360 


9:116468537-116473850 


C9orf91 


9.63 E-04 


3.25E-24 


1.74E-01 


rsl 887784 


9:116458405-116464475 


C9orf91 


6.83E-05 


2.50E-22 


2.66E-01 


rs628383 


3:152071522-152094779 


CLRN1 


4.74E-34 


2.20E-14 


2.67E-01 


rs2798686 


1:113355832-113417250 


LRIG2 


1 .07E-02 


4.45E-30 


3.36E-01 


rsl 234351 6 


9:122645199-122654702 


PHF19 


2.96E-06 


2.11E-94 


6.18E-01 


rs81 12960 


19:21561436-21697351 


ZNF100 


8.66E-06 


7.45E-09 


8.47E-01 


rs49 16908 


5:87600489-87768258 


TMEM161B 


6.53E-03 


1.30E-28 


8.50E-01 


Unknown mode of regulation 


rsl 870832 


6:89065388-89187471 


CNR1 


7.91 E-1 2 


1.68E-28 


NA 


rsl 7739675 


2:64418705-64422285 


LGALSL 


5.01 E-08 


5.71 E-1 7 


NA 


rs841603 


12:27149351-27205593 


C12orf71 


1.36E-03 


8.20E-28 


NA 


rsl 868841 


8:58568012-58666672 


FAM110B 


3.55E-05 


3.06E-19 


NA 


rsl 348478 


5:119609042-119697096 


PRR16 


1.69E-02 


9.64E-14 


NA 
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Table 1. Cont. 



p-value for 





IncRNA 


Refseq 


p-value for 


p-value for 


ASE of 


SNP 1 


region 


gene 


co-expression 2 


ASE of lincRNA 3 


Refseq gene 4 


rs7046236 


9:70345772-70437996 


TMEM252 


4.83 E-06 


6.12E-24 


NA 


rsl 0772397 


12:11256393-11295570 


PRB3 


1 .04E-28 


1.04E-51 


NA 


rs9423393 


10:5266321-5295165 


AKR1C4 


2.19E-03 


8.29E-41 


NA 


rs928736 


21:33352005-33359159 


OLIG1 


1.31E-10 


1.00E-27 


NA 


rs71 32674 


12:11443582-11530879 


PRB2 


3.28E-33 


2.82E-1 1 


NA 


rsl 7739675 


2:64475858-64534436 


LGALSL 


5.01 E-08 


1.14E-10 


NA 


rsl 0956365 


8:128420701-128474058 


POU5F1B 


3.22E-11 


1.23E-07 


NA 



'All 52 significantly co-expressed IncRNA and protein coding genes with a IncRNA associated c/s-rSNP are listed. 
2 Multiple testing correction using FDR of 10%, 
3 The p-value cut-off for significant ASE is 10~ 6 , 

4 ASE p-value for a Refseq gene is shown for all regions that are co-expressed with a IncRNA and have an overlapping ASE analysis window, NA otherwise. 
doi:1 0.1 371 /journal.pone.01 0261 2.W01 



SNPs having LD=1.0 and two having LD> = 0.88. Table 2 
summarizes the data for overlapping GWAS SNPs and a'r-rSNP 
associated with ASE of IncRNA regions as well as the phenotypes 
to which these SNPs were associated in GWAS. Figure 3 
exemplifies a IncRNA region with association between a SNP 
reported in GWAS and allele-specific expression. For additional 
examples see Figure S4. The association signals highlight 26 
unique traits, out of which 9 are related to the immune system, 
which is not unexpected given that monocytes function in the 
immune system. Notably, three of the «'.f-rSNP that were 
significandy associated with ASE of a IncRNA are located in an 
intron of a protein coding gene, illustrating a direct or indirect 
regulatory function of an intronic SNP. One locus is reported by 
GWAS to be associated with a non-coding gene (genomic region: 
2:19067188-19369296), which we confirm here by a cw-rSNP for 
the same IncRNA gene. Furthermore, when searching for overlaps 
between «'.s-rSNP associated with IncRNA and GWAS SNPs that 
are located in active enhancer regions in monocytes, we detected 
three GWAS SNPs that are located in a ChlP-seq peak for 
H3K4mel, and seven additional SNPs that are in the 2.5 kb 
enhancer regions flanking the histone mark. For H3K27ac no 
overlap between the ew-rSNP and the actual ChlP-seq peak was 
found, but five SNPs were located in the 2.5 kb flanking region. 
The number of overlapping GWAS SNPs - a't-rSNPs that map to 
an enhancer region are enriched at the same level as eit-rSNPs in 
general. For these GWAS SNPs the effect on the enhancer could 
be the functional mechanism. 

For Refseq protein coding regions we found 458 loci where a 
reported GWAS SNP overlaps with a cu-rSNP or a proxy a'j-rSNP 
in high LD (31 SNPs with LD 1.0 and 29 with LD>0.81 (Table 
S2). As for the IncRNAs a high fraction (n= 112) of the traits are 
related to immune diseases. 

Discussion 

The combination of a large number of samples from primary 
monocytes and the sensitive genotyping method for detecting cis- 
regulatory SNPs based on allele-specific gene expression [4] 
renders our study well powered for detecting as-regulatory SNPs 
that affect expression of long non-coding RNAs, which are 
expressed at a lower level than protein-coding genes. In the ASE 
approach transcript are detected using a genome-wide panel of 
SNPs markers, and hence the use of predefined hybridization 
probes for each transcript is circumvented. This unbiased 



detection of expressed transcripts is an advantage of ASE analysis 
over expression microarrays. This advantage is shared by ASE and 
"next" generation transcriptome sequencing (RNA-seq). An 
advantage of RNA-seq compared to ASE analysis using SNP 
genotyping is that alternatively spliced transcripts and strand 
specific gene expression can be detected, provided that the 
sequencing coverage is sufficient. However, ASE analysis using 
RNA-seq suffers from the drawback of being highly affected by the 
reference bias [31,32], although methods that can correct for most 
of the bias have been developed recendy [33,34]. Moreover, lowly 
expressed genes obtain very low coverage in RNA-seq. 

In this study we focused our ASE analysis of monocytes on 5346 
IncRNAs that were annotated by the ENCODE project as 
intergenic without overlap with protein coding genes. Thus we 
ensured that our analysis targeted truly IncRNA genes and was not 
confounded by overlapping protein-coding genes. Using stringent 
criteria for calling ASE, we identified 258 IncRNAs that were 
regulated by at least one cz's-rSNP. We observed correlated 
expression of 20% of the IncRNAs with their neighbouring 
protein-coding genes, and co-regulation of a IncRNA and a 
neighbouring protein coding gene by the same cu-rSNP for 
approximately half of these IncRNA genes. This finding is 
consistent with a recent study of peripheral blood cells, where 
75% of 1 12 IncRNAs that were mapped using traditional eQTL 
analysis were found to be regulated independently of nearby 
protein-coding genes [19]. Because the ASE analysis used in our 
study only detects cit-regulation, ASE analysis allows the dissection 
of ciy-effects by IncRNAs that influence the transcription of nearby 
genes. If the expression level of a IncRNA and a nearby protein 
coding gene are correlated, and they both show significant ASE 
association with the same aV-rSNP, it is likely that the cz't-rSNP 
directly regulates the protein coding gene. If the expression levels 
between a protein-coding and IncRNA gene are correlated, but 
there is no shared association with a cw-SNP, the IncRNA may be 
a regulatory factor that affects the expression of the adjacent 
protein coding gene. In our ASE data set we found that these two 
putative mechanisms occur at similar frequencies. However, using 
the ASE approach, trans-acting regulatory mechanisms for co- 
expressed genes would remain undetected. A recent study used 
traditional eQTL analysis of monocytes found several trans- 
regulated modules of co-expressed protein coding genes. However, 
IncRNA expression was not addressed in this study [35] . 



PLOS ONE | www.plosone.org 



7 



July 2014 | Volume 9 | Issue 7 | e102612 



Gs-Regulation of Long Non-Coding Transcripts 



Table 2. SNPs associated with allele-specific expression of IncRNA windows with published trait- or disease-associations from 
genome-wide association studies. 







ASE 








GWAS 




Genomic 
region of 
IncRNA 


Distance to 
GWAS 
SNP (bp) 1 


In or close 
to histone 
mark 


ASE 
p-value 


Slope 2 


SNP 3 


Trait 4 


1: 22224268-22229983 


16228 


H3K4m1 5 


2.41 E-7 


0.13 


rs2501276 


Immune response 
to smallpox 
vaccine (IL-6) 


1: 22224268-22229983 


128475 




3.03E-7 


0.07 


rsl 6826658 


Endometriosis 


1: 41480834-41523120 


177376 




2.66E-12 


0.09 


rs6686842 6 


Height 


1: 220153613-220218488 


13083 


H3K4m1 


8.46E-17 


0.22 


rs6687758 


Colorectal 
cancer 


2: 19067188-19369296 


68959 




1.36E-7 


0.06 


rs 1876040 


Cognitive test 
performance 


2: 101944950-101970061 


59999 




1.32E-7 


0.08 


rs2310173 


Ankylosing 
spondylitis 


2: 166359613-166410847 


68123 




5.60E-9 


0.06 


rs6710518 


Bone mineral 
density 


2: 166359613-166410847 


50321 




1.82E-9 


0.06 


rs1346004 7 


Bone mineral 
density 


3: 157947829-158017517 


91268 




3.72E-25 


0.08 


rsl 2638253 


Multiple 
sclerosis 
(severity) 


6: 28237538-28245351 


184924 


H3K4m1 5 
H3K27ac 5 


2.25E-8 


0.06 


rs6903823 8 


Pulmonary 
function 


6: 30050868-30054162 


7639 


H3K4m1 


5.90E-11 


0.08 


rs3893464 


Graves' 
disease 


6: 29802357-29824805 


257099 


H3K27ac 5 


6.18E-7 


0.08 


rs4313034 9 


Graves' 
disease 


6: 30874410-30906415 


204180 




4.05 E- 12 


0.11 


rs4248154 


Graves' 
disease 


6: 58380311-58395738 


21176 




3.60E-21 


0.07 


rs9500256 10 


Eosinophilic 
esophagitis 
(pediatric) 


7: 1166536-1171429 


161007 




3.40E-14 


0.07 


rsl 0256972 


Longevity 


7: 7261310-7283935 


26354 




4.86E-9 


0.07 


rsl 0259085 


Multiple 
sclerosis 
(severity) 



7: 7261310-7283935 0 H3K4m1 5 3.41E-7 0.06 rs1299548 Amount of visceral 



adipose tissue adjusted 
for body mass 
index (BMI) 



8: 23138679-23144384 


0 


H3K4m1 5 
H3K27ac s 


9.45 E-89 


0.27 


rsl 3278062 


Age-related 

macular degeneration 




9: 122645199-122654702 


25619 


H3K4m1 
H3K27ac 5 


4.09E-16 


0.13 


rs1953126 


Celiac disease and 
Rheumatoid arthritis 




9: 122645199-122654702 


38017 




1.02E-16 


0.13 


rs881375 


Rheumatoid 
arthritis 




9: 122645199-122654702 


75358 


H3K4m1 5 
H3K27ac 5 


4.13E-17 


0.13 


rs3761847 


Rheumatoid 
arthritis 




9: 135120874-135140438 


0 




3.39E-50 


0.43 


rs687621 


D-dimer 
levels 




9: 135120874-135140438 


0 




2.69E-66 


0.43 


rs657152 


Liver enzyme 
levels 




9: 135120874-135140438 


0 




2.69E-66 


0.43 


rs643434 


Inflammatory 
biomarkers 




9: 135120874-135140438 


0 




5.66E-52 


0.44 


rs612169 


Metabolic 
traits 




9: 135120874-135140438 


0 




1 .00E-50 


0.44 


rs505922 


Protein 
quantitative 
trait loci 
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Table 2. Cont. 









ASE 








GWAS 




Genomic 
region of 
IncRNA 


Distance to 
GWAS 
SNP (bp) 


In or close 
to histone 
mark 


ASE 
p-value 


Slope 2 


SNP 3 


Trait 4 


9: 135120874-135140438 


0 




3.80E-22 


0.43 


rs507666 


Soluble 
ICAM-1 


9: 135120874-135140438 


3551 




8.83 E-26 


0.43 


rs579459 


Soluble E-selectin 
levels 


9: 135120874-135140438 


4250 




7.93E-25 


0.43 


rs495828 


Angiotensin-converting 
enzyme activity 


15: 43458687-43481812 


30170 




1.24E-21 


0.14 


rs2453533 11 


Chronic 
kidney disease 


1 5: 82975686-82986699 


245274 




4.86E-13 


0.05 


rs3743162 12 


Alzheimer's 
disease (age 
of onset) 


17: 26060783-26121168 


150673 


H3K4m1 


1.28E-8 


0.06 


rs3760318 


Height 



'The distance to the IncRNA region is 0 if the SNP is located within the region and the smallest distance otherwise, 
2 Slope is given in absolute numbers, 

3 Listed are all cis-rSNPs that are also found in the GWAS catalog together with the associated IncRNA, 

4 The trait is taken from the GWAS catalog, 

5 Within 2.5 kb, 

6 rs6663565 as proxy, 

7 rs2303393 as proxy, 

8 rs69221 1 1 as proxy, 

9 rs7739434 as proxy, 

10 rsl 3214831 as proxy, 

11 rsl 153862 as proxy, 

12 rs1 2442557 as proxy. 

doi:1 0.1 371 /joumal.pone.01 0261 2.W02 



A drawback of our study is that this specific dataset was not 
replicated by an independent method. However, in a previous 
study we have shown that a similar method for genome-wide ASE- 
analysis using NS-12 BeadChips (Illumina) was highly reproduc- 
ible between replicate samples, with correlation coefficients of 
0.9969 for gDNA and 0.9956 for cDNA. ASE-levels determined 
using NS-12 BeadChips and those determined by quantitative 
Sanger sequencing were also strongly correlated (0.86) in nine 



representative genes [3]. Furthermore, in the current study the 
consistency of die ASE-signal for individual SNPs across the 
IncRNA regions is high, with an average standard deviation of 
0.05 over all regions and samples. The consistency of the ASE- 
signals can be observed in Figure S4, where almost all SNPs in the 
IncRNA regions show overexpression of the same allele. 

In our study we found that 32 of the cu-rSNPs that regulate the 
expression of IncRNAs have been identified in GWAS as risk 



GWAS SNP: 1-220231 571 -rs6687758 



220140000 



220160000 



220180000 
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220240000 



60 

30 

0 
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Figure 3. Illustration of a region with a SNP from genome wide association studies (GWAS) which is associated with ASE of IncRNAs. 

The tracks are from top to bottom in each panel: Horizontal red bars represent IncRNA transcript windows (with genomic coordinates) used for 
determination of ASE levels; grey lines show p-values for the association of GWAS SNPs with ASE levels in the transcript window; a grey line overlayed 
with a red dotted line indicates that a c/s-rSNP overlaps with the reported SNP in the GWAS catalog; red vertical lines are median ASE4evels for each 
SNP. 

doi:1 0.1 371 /journal.pone.01 0261 2.g003 
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variants for human diseases or traits. The diseases or phenotypic 
traits associated with these a's-regulatory SNPs according to the 
IncRNA ASE analysis, include diseases of the immune system; 
multiple sclerosis, Graves' disease, eosinophilic esophagitis, celiac 
disease, rheumatoid arthritis, immune response to small pox 
vaccine, soluble E-selectin levels, soluble IGAM-1, and ankylosing 
spondylitis, for which monocytes are a relevant cell type (Table 1). 
Chronic kidney disease is also a relevant trait as it can be caused 
both by the autoimmune disease type 1 diabetes and by 
hypertension, which are diseases for which monocytes are a 
relevant cell type [25]. By manually classifying the traits in the 
GWAS catalog into two classes, ie traits related to the immune 
system, and other traits, we tested for over-representation of cis- 
rSNPs in the GWAS-catalog using Fisher's exact test. For traits 
associated to the immune system, the p-value for overrepresenta- 
tion is 0.00027. The SNP rsl3278062 that confers risk of age- 
related macular degeneration detected in our study of monocytes 
using ASE analysis (Table 1) was also highlighted by traditional 
eQTL analysis of peripheral blood samples [19], while none of the 
other detected risk SNPs or traits overlapped between the two 
studies. 

We also found that the identified of-rSNPs for IncRNA are 
enriched to active enhancer regions in monocytes, which suggests 
a mechanism for their civ-regulatory functions. The reported 32 
risk SNPs from GWAS that are strongly associated with the 
expression of non-coding RNAs provides interesting leads for 
further characterization and functional clues into immune 
diseases. Some of the GWAS SNPs are located in enhancer 
regions, which could be the cause of the allele specific expression 
of the IncRNA. Thus our study suggests more complex functional 
mechanisms underlying findings from GWAS than regulatory 
variants or expression levels of nearby protein coding genes, and 
provides novel insights into the relationship between genetic 
variation and human diseases. 

Supporting Information 

Figure SI Schematic picture of the principles ASE 
analysis. In ASE the allele-specific expression level is measured 
by the difference in fluorophore signal intensity between the two 
alleles in the same sample. The average ASE-level is calculated for 
all heterozygous SNPs in the region. This ASE-value is then used 
in an association test against the genotypes of Of-rSNPs (shown in 
yellow). Figure adapted from Almlof et al [4]. 
(PDF) 

Figure S2 Signal intensity threshold. The figure shows the 
distribution of signal intensities for the two alleles. The x-channel 
represents the A-allele and the y-channel represents the B-allele. In 
the xraw panel the fraction of SNPs with the BB genotype having 
intensity levels above the cutoff level of 1000 is very low and 
similarly for the yraw panel. 
(PDF) 

Figure S3 Regression lines. Regression lines for the 32 risk 
SNPs from GWAS that are significandy associated with a IncRNA 
region. 
(PDF) 

Figure S4 IncRNA in the GWAS catalog. Illustration of all 
IncRNA regions that have a significant association to a SNP that is 
also significantly associated in the GWAS catalog. The tracks are 
from top to bottom in each panel: Horizontal red bars represent 
transcript windows (with genomic coordinates) used for determi- 
nation of ASE levels; grey lines show p-values for the association of 
GWAS SNPs with ASE levels in the transcript window; a grey line 



overlayed with a red dotted line indicates that this is the SNP that 
overlaps with the reported SNP in the GWAS catalog; red vertical 
lines are median ASE-levels for each SNP; annotated transcripts 
are shown in black below the tracks. 
(PDF) 

Table SI Significantly associated IncRNA regions. 

(DOCX) 

Table S2 SNPs associated with allele-specific expres- 
sion of Refseq protein coding genes with published trait- 
or disease-associations from genome-wide association 
studies. 

(DOCX) 

Methods SI Quadratic normalization. Detailed explana- 
tion of the quadratic normalization performed on the intensity 
levels obtained from the genotyping. 
(DOCX) 
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